Sains Malaysiana 53(4)(2024): 935-951

http://doi.org/10.17576/jsm-2024-5304-16

 

Detection of Outliers in Circular Regression Model via DFBETAcIS Statistic

(Pengesanan Outlier dalam Model Regresi Bulat melalui Statistik DFBETAcIS)

 

INTAN MASTURA RAMLEE1,2, SAFWATI IBRAHIM1,2,*, LEOW WAI ZHE3 & MOHD IRWAN YUSOFF3

 

1Institute of Engineering Mathematics, Universiti Malaysia Perlis, Pauh Putra Main Campus, 02600 Arau, Perlis, Malaysia

2Centre of Excellence for Social Innovation and Sustainability (COESIS), Universiti Malaysia Perlis, 02600 Arau, Perlis, Malaysia

3Faculty of Electrical & Technology Engineering, Universiti Malaysia Perlis, Pauh Putra Main Campus, 02000 Arau, Perlis, Malaysia

 

Received: 4 July 2023/Accepted: 1 March 2024

 

Abstract

Outliers in circular regression models have recently received much attention. Outliers can change the sign and magnitude of the regression coefficients, leading to inaccurate models and incorrect predictions. Previous studies have proposed many methods for detecting outliers in a circular regression model, such as the COVRATIO, D, M, A, and Chord statistics, but these are suspected to perform poorly when a data set contains multiple outliers because they do not account for masking and swamping. This study aimed to develop an outlier detection procedure based on the DFBETAc statistic for circular data, in which the new statistic identifies multiple outliers in the Jammalamadaka and Sarma circular regression model (JSCRM) while taking masking and swamping effects into account. Monte Carlo simulations are used to determine the corresponding cut-off points and to investigate the power of performance. The performance of the proposed statistic is evaluated by the proportion of outliers detected and by the rates of masking and swamping. The simulation is run at 10% and 20% contamination levels for varying sample sizes. The results show that the proposed DFBETAcIS statistic for the JSCRM successfully detects the outliers. For illustration, the procedure is applied to wind direction data.
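The three performance measures named in the abstract can be illustrated with a minimal sketch. It assumes the common definitions (masking: a planted outlier that is not flagged; swamping: a clean observation that is wrongly flagged); the paper's exact definitions may differ. The function name `detection_rates` and the example indices are hypothetical and are not taken from the study.

```python
def detection_rates(true_outliers, flagged, n):
    """Summarise one simulated sample of size n.

    true_outliers: indices of the planted (contaminated) observations.
    flagged: indices that a detection statistic marked as outliers.
    Definitions used here are assumptions, not the paper's own.
    """
    true_set = set(true_outliers)
    flagged_set = set(flagged)
    clean = set(range(n)) - true_set

    detected = true_set & flagged_set   # planted outliers that were found
    masked = true_set - flagged_set     # planted outliers that were missed
    swamped = clean & flagged_set       # clean points wrongly flagged

    return {
        "detected": len(detected) / len(true_set) if true_set else 0.0,
        "masking": len(masked) / len(true_set) if true_set else 0.0,
        "swamping": len(swamped) / len(clean) if clean else 0.0,
    }


# Hypothetical example: n = 50 with 10% contamination (5 planted outliers).
rates = detection_rates(true_outliers=[0, 1, 2, 3, 4],
                        flagged=[0, 1, 2, 3, 17],
                        n=50)
print(rates)
```

In a Monte Carlo study these rates would be averaged over many simulated samples for each sample size and contamination level.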

 

Keywords: Circular regression model; DFBETAc; outlier  

 


 

REFERENCES

Abuzaid, A.H. 2020. Detection of outliers in univariate circular data by means of the outlier local factor (LOF). Statistics in Transition New Series 21(3): 39-51.

Abuzaid, A.H. 2010. Some problems of outliers in circular data. Doctoral dissertation, University of Malaya.

Abuzaid, A.H., Mohamed, I.B. & Hussin, A.G. 2009. A new test of discordancy in circular data. Communications in Statistics-Simulation and Computation 38: 682-691.

Abuzaid, A.H., Hussin, A.G., Rambli, A. & Mohamed, I. 2012. Statistics for a new test of discordance in circular data. Communications in Statistics-Simulation and Computation 41: 1882-1890.

Alkasadi, N.A., Ibrahim, S., Abuzaid, A. & Yusoff, M.I. 2019. Outlier detection in multiple circular regression model using DFFITC statistic. Sains Malaysiana 48(7): 1557-1563.

Alkasadi, N.A., Abuzaid, A.H., Ibrahim, S. & Yusoff, M.I. 2018. Outliers detection in multiple circular regression model via DFBETAc statistic. International Journal of Applied Engineering Research 13: 9083-9090.

Alkasadi, N.A., Ibrahim, S., Ramli, M.F. & Yusoff, M.I. 2016. A comparative study of outlier detection procedures in multiple circular regression. In AIP Conference Proceedings 1775: 030032.

Barnett, V. & Lewis, T. 1994. Outliers in Statistical Data. New York: John Wiley and Sons. 

Belsley, D.A., Kuh, E. & Welsch, R.E. 1980. Regression Diagnostics: Identifying Influential Data and Sources of Collinearity. New York: John Wiley & Sons.

Binkley, S.A. 1990. The Clockwork Sparrow: Time, Clocks, and Calendars in Biological Organisms. New Jersey: Prentice Hall.

Chambers, R., Hentges, A. & Zhao, X. 2004. Robust automatic methods for outlier and error detection. Journal of the Royal Statistical Society: Series A (Statistics in Society) 167: 323-339.

Chatterjee, S. & Hadi, A.S. 1988. Impact of simultaneous omission of a variable and an observation on a linear regression equation. Computational Statistics & Data Analysis 6: 129-144.

Collett, D. 1980. Outliers in circular data. Journal of the Royal Statistical Society Series C: Applied Statistics 29(1): 50-57.

Cook, R.D. 1977. Detection of influential observation in linear regression. Technometrics 19(1): 15-18.

Cousineau, D. & Chartier, S. 2010. Outliers detection and treatment: A review. International Journal of Psychological Research 3: 58-67.

Downs, T. 1974. Rotational angular correlation. In Biorhythms and Human Reproduction, edited by Ferin, M., Halberg, F. & van der Wiele, L. New York: Wiley. pp. 97-104.

Fisher, N.I. & Lee, A.J. 1992. Regression models for an angular response. Biometrics 48(3): 665-677.

Follmann, D.A. & Proschan, M.A. 1999. A simple permutation‐type method for testing circular uniformity with correlated angular measurements. Biometrics 55(3): 782-791.

Gould, A.L. 1969. A regression technique for angular variates. Biometrics 25(4): 683-700.

Hrushesky, W.J.M. 1985. Circadian timing of cancer chemotherapy. Science 228: 73-75.

Hussin, A.G., Fieller, N.R.J. & Stillman, E.C. 2004. Linear regression model for circular variables with application to directional data. Journal of Applied Science and Technology 9(1 & 2): 1-6.

Ibrahim, S. 2013. Some outlier problems in a circular regression model. Doctoral dissertation, Faculty of Science, Universiti Malaya.

Ibrahim, S., Rambli, A., Hussin, A.G. & Mohamed, I. 2013. Outlier detection in a circular regression model using COVRATIO statistic. Communications in Statistics-Simulation and Computation 42(10): 2272-2280.

Jammalamadaka, S. & Sarma, Y. 1993. Circular regression. Statistical Science and Data Analysis 34: 109-128.

Jha, J., Biswas, A. & Cheng, T.C. 2022. Trimmed estimator for circular–circular regression: Breakdown properties and an exact algorithm for computation. Statistics 56(2): 375-395.

Johnson, R.A. & Wehrly, T.E. 1978. Some angular-linear distributions and related regression models. Journal of the American Statistical Association 73: 602-606.

Jones, M.C. & Silverman, B.W. 1989. An orthogonal series density estimation approach to reconstructing positron emission tomography images. Journal of Applied Statistics 16: 177-191.

Lowrey, P.L., Shimomura, K., Antoch, M.P., Yamazaki, S., Zemenides, P.D., Ralph, M.R., Menaker, M. & Takahashi, J.S. 2000. Positional syntenic cloning and functional characterization of the mammalian circadian mutation tau. Science 288(5465): 483-492.

Lund, U. 1999. Least circular distance regression for directional data. Journal of Applied Statistics 26: 723-733.

Mackenzie, J.K. 1957. The estimation of an orientation relationship. Acta Crystallographica 10: 61-62.

Mardia, K. 1975. Statistics of directional data (with discussion). Journal of the Royal Statistical Society 37: 390.

Meilán-Vila, A., Crujeiras, R.M. & Francisco-Fernández, M. 2021. Nonparametric estimation of circular trend surfaces with application to wave directions. Stochastic Environmental Research and Risk Assessment 35(4): 923-939.

Mohamed, I.B., Rambli, A., Khaliddin, N. & Ibrahim, A.I.N. 2016. A new discordancy test in circular data using spacings theory. Communications in Statistics-Simulation and Computation 45: 2904-2916.

Mokhtar, N.A., Zubairi, Y.Z., Hussin, A.G. & Moslim, N.H. 2019. An outlier detection method for circular linear functional relationship model using covratio statistics. Malaysian Journal of Science 38(Special Issue 2): 46-54.

Moore-Ede, M.C., Sulzman, F.M. & Fuller, C.A. 1982. The Clocks that Time Us: Physiology of the Circadian Timing System. Massachusetts: Harvard University Press.

Rambli, A., Yunus, R.M., Mohamed, I. & Hussin, A.G. 2015. Outlier detection in a circular regression model. Sains Malaysiana 44(7): 1027-1032.

Rivest, L.P. 1997. A decentered predictor for circular-circular regression. Biometrika 84: 717-726.

Rousseeuw, P.J. & Leroy, A.M. 2005. Robust Regression and Outlier Detection. New York: John Wiley & Sons.

Shearman, L.P., Sriram, S., Weaver, D.R., Maywood, E.S., Chaves, I., Zheng, B., Kume, K., Lee, C.C., van der Horst, G.T.J., Hastings, M.H. & Reppert, S.M. 2000. Interacting molecular loops in the mammalian circadian clock. Science 288(5468): 1013-1019.

Stephens, M.A. 1979. Vector correlation. Biometrika 66(1): 41-48.

Weir, I.S. & Green, P.J. 1994. Modelling data from single-photon emission computerized tomography. Journal of Applied Statistics 21: 313-337.

 

*Corresponding author; email: safwati@unimap.edu.my